Human Genomics
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Human Genomics's content profile, based on 13 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Caieiro, D.; Faria, N. A.; Botelho, A.; Araujo, M.; Ramos, L.; Calvao, J.; Goncalo, M.; Miragaia, M.
Staphylococcus aureus plays a central role in the exacerbation of atopic dermatitis (AD), but the population structure and pathogenic determinants of strains colonizing AD patients remain poorly understood. It is unclear whether these strains mirror those circulating in the general community or whether specific clonal lineages are selectively adapted to the AD skin microenvironment. Data addressing this question are scarce, particularly in Portugal. In this study, we investigated the molecular epidemiology and pathogenic traits of S. aureus colonizing skin lesions in adult patients with AD in Portugal. We found that lesion-associated isolates belonged predominantly to the methicillin-susceptible S. aureus MSSA-ST398 clonal type, a lineage that is widely circulating in the Portuguese community, particularly among vulnerable populations, and that has also been implicated in severe human infections. Notably, isolates from this clonal type in AD harboured specific pathogenicity traits associated with skin barrier disruption, including hemolysin and urease production, which may contribute to their success as colonizers in AD. Our findings highlight that S. aureus colonization in AD arises from a dynamic interplay between community-level molecular epidemiology and disease-specific selective pressures. While circulating lineages provide the genetic background diversity, the AD skin microenvironment appears to shape which clones ultimately become dominant. Such an integrated perspective may help to inform future geographically tailored strategies aimed at limiting bacterial burden and preventing disease exacerbation in AD.
Kovanda, A.; Hodzic, A.; Kotnik, U.; Visnjar, T.; Podgrajsek, R.; Andjelic, A.; Jaklic, H.; Maver, A.; Lovrecic, L.; Peterlin, B.
STUDY QUESTION: Do structural genomic variants that can be identified using optical genome mapping contribute to male infertility (MI)?
SUMMARY ANSWER: Using optical genome mapping, we can identify several types of structural variants, both known and new, that may contribute to male infertility.
WHAT IS KNOWN ALREADY: Traditional approaches such as karyotyping and CFTR and chromosome Y microdeletion testing explain clinical findings in ~30% of MI patients, leaving the rest without a genetic diagnosis. Recent research suggests at least 265 genes may play a role in male fertility. While the assessment of the roles of copy number variants and single nucleotide variants in monogenic forms of disease in these genes is underway, much less is known about structural variants.
STUDY DESIGN, SIZE, DURATION: We performed a longitudinal case/control study on a total of 220 individuals: 88 patients with male infertility who were negative for cytogenetic abnormalities on karyotyping and on molecular testing for chrY microdeletions and CFTR gene variants, and 132 healthy male individuals who underwent optical genome mapping for other reasons. Exclusion criteria for the control cohort were low sperm quality and/or inclusion in IVF procedures. The study was approved by the National Medical Ethics Committee of the Republic of Slovenia (reference number: 0120-213/2022/6). Optical genome mapping was performed on an aliquot of whole blood collected for routine testing purposes at the Clinical Institute of Genomic Medicine (CIGM), UMC Ljubljana, from January 2023 to November 2024.
PARTICIPANTS/MATERIALS, SETTING, METHODS: We examined structural variants in 220 participants using optical genome mapping, which was performed with DLE-1 SP-G2 chemistry and the Saphyr instrument. The de novo assembly and Variant Annotation Pipeline were executed on Bionano Solve3.7_20221013_25, while reporting and direct visualization of structural variants were done in Bionano Access 1.7.2. All obtained variants were filtered using the Bionano Access software and in-house generated gene/regions of interest panel bed files. The first filter was applied to include variants below a population frequency of 10% and overlapping the regions of interest. Subsequently, all variants occurring with frequency 0% in the internal manufacturer variant dataset were manually evaluated for possible involvement of the overlapping genes or regions in biological processes involved in MI. The male infertility cohort also underwent research whole exome analyses as previously reported. All results of optical genome mapping were confirmed by an appropriate alternative method where available.
MAIN RESULTS AND THE ROLE OF CHANCE: We show that the overall number of structural variants in MI patients does not differ from that of healthy individuals. By looking in detail at genes and regions associated with MI, we identified 21 rare variants absent from controls in 25.0% of MI patients, of which five were likely causative and two would be missed by traditional approaches. These variants include inversions, duplications, amplifications, deletions (e.g. SPAG1), and insertions/expansions (e.g. DMPK) that were validated using additional methods. While the remaining SVs cannot currently be classified as pathogenic according to existing criteria, they open a new avenue in genetic research of MI.
LARGE SCALE DATA: Variants reported in this study were deposited into ClinVar under accession number SUB15650956 (https://www.ncbi.nlm.nih.gov/clinvar/).
LIMITATIONS, REASONS FOR CAUTION: Technical limitations of optical genome mapping include the lack of DLE-1 labelling of centromeric and telomeric regions, the inability to detect Robertsonian translocations, the unclear exact location of smaller structural variants located between the DLE-1 labels, and unclear boundaries when variants lie in segmentally duplicated regions (this limitation is shared with other methods). The ACMG criteria of rarity are also hard to apply, as the fertility status of the individuals in healthy population databases such as gnomAD and DGV is unknown. Similarly, gene-associated phenotype and the proposed inheritance model both need to be considered as parts of the ACMG criteria, but for many candidate genes associated with MI, no model of inheritance has yet been proposed.
WIDER IMPLICATIONS OF THE FINDINGS: Currently, with the established diagnostic approaches we are able to resolve ~30% of male infertility cases, with ~70% of patients remaining undiagnosed. The significance of our work is in showing that rare structural variants can be identified in MI using optical genome mapping, opening new avenues of research into the genetics of this important contributor to human fertility.
STUDY FUNDING/COMPETING INTEREST(S): All authors declare having no conflict of interest in regard to this research. This work was funded by the Slovenian Research and Innovation Agency (ARIS) Programme grant P3-0326: Gynecology and Reproduction: Genomics for personalized medicine.
LAY SUMMARY: Male infertility affects about 5% of adult males and has complex causes, including genetic ones, such as mutations in the CFTR gene, small deletions on chromosome Y, and balanced translocations, but currently we can only find a genetic cause in ~30% of patients. This means ~70% of cases remain undiagnosed, but they too may have a yet unknown genetic cause. Indeed, research so far has proposed at least 265 genes as playing a role in male fertility. In these genes, there has so far been limited research on single nucleotide variants and copy number variants, and many structural variants are not visible using the methods commonly used in clinical genetic testing. Therefore, apart from chromosome Y microdeletions and chromosomal numerical and structural anomalies, such as balanced translocations, the role of smaller structural variants in male infertility is unknown, but based on what we know from other diseases, they may also play a role in male infertility. Optical genome mapping is a novel method for the detection of structural variants, such as balanced and unbalanced translocations, insertions, duplications, deletions, and complex structural rearrangements in a wide range of sizes. By using optical genome mapping to test a cohort of 88 infertile men and 132 healthy controls, we aimed to provide the first insights into the range of SVs that may be associated with MI. We found that the overall number of structural variants in MI patients was not significantly different from that in the control group. However, by looking at genes and regions associated with MI, we can find rare structural variants that are absent from controls in 25.0% of MI patients. These variants include inversions, duplications, amplifications, deletions (e.g. deletion in SPAG1), and insertions/expansions (e.g. in DMPK) that were validated using additional methods. Five of these variants (5.6%) were likely causative, and two would be missed by traditional approaches. While the remaining SVs cannot currently be classified as pathogenic according to existing criteria, they open a new avenue in genetic research of MI.
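The first filtering step described in the methods of this abstract (keep variants below a 10% population frequency that overlap a region of interest) can be illustrated with a minimal Python sketch. This is not the Bionano Access implementation; the class names, the half-open coordinate convention, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention, as in BED files)
    end: int    # exclusive


@dataclass(frozen=True)
class StructuralVariant:
    region: Interval
    sv_type: str
    population_freq: float  # fraction between 0 and 1 in a control dataset


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_variants(variants, regions_of_interest, max_freq=0.10):
    """Keep variants rarer than max_freq that overlap any region of interest."""
    return [
        v for v in variants
        if v.population_freq < max_freq
        and any(overlaps(v.region, r) for r in regions_of_interest)
    ]


# Hypothetical panel region and calls; the coordinates are made up.
panel = [Interval("chr8", 1_000, 5_000)]
calls = [
    StructuralVariant(Interval("chr8", 4_500, 6_000), "deletion", 0.0),
    StructuralVariant(Interval("chr8", 2_000, 3_000), "insertion", 0.35),
    StructuralVariant(Interval("chr1", 2_000, 3_000), "inversion", 0.0),
]
kept = filter_variants(calls, panel)
print([v.sv_type for v in kept])
```

Only the rare deletion survives here: the insertion is too common and the inversion lies outside the panel. The later manual review of variants with 0% frequency in the manufacturer dataset is a curation step and is not modelled.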
Buianova, A. A.; Cheranev, V. V.; Shmitko, A. O.; Vasiliadis, I. A.; Ilyina, G. A.; Suchalko, O. N.; Kuznetsov, M. I.; Belova, V. A.; Korostin, D. O.
Introduction: Adverse drug reactions (ADRs) remain a major public health issue, and genetic factors contribute importantly to interindividual variability in drug response. Pharmacogenetic testing helps reduce ADR risk by optimizing drug selection and dosage, particularly in monogenic disorders. Materials and Methods: Whole-exome sequencing of 6,739 samples from the Russian population was performed using the MGIEasy Universal DNA Library Prep Set on the DNBSEQ-G400 platform (MGI). Variants in 48 genes were examined, focusing on inherited arrhythmias (Long QT syndrome, Short QT syndrome, Timothy syndrome, Andersen-Tawil syndrome, Brugada syndrome, Atrial fibrillation, Catecholaminergic polymorphic ventricular tachycardia), enzyme deficiencies (Glucose-6-Phosphate Dehydrogenase Deficiency [G6PDD], Porphyrias), Dravet Syndrome (DS) and Malignant Hyperthermia (MH). All identified variants had been reported at least once as pathogenic (P) or likely pathogenic (LP) in ClinVar, along with those occasionally classified as variants of uncertain significance (VUS). Each variant was manually re-evaluated according to ACMG criteria. Results: A total of 75 unique variants in 18 genes were observed in 119 individuals (1.77%), including 21 carriers and 13 women with a G6PD mutation. Of these, 46 variants were classified as P, 21 as LP, and 8 as VUS. Missense variants accounted for the largest proportion (73.33%). The most affected genes were KCNQ1 (24/119), which exhibited the highest number of unique variants (18), G6PD (20/119), SCN1A (15/119), and RYR1 (14/119). Regarding associated conditions, mutations linked to arrhythmias were found in 51 individuals, MH in 27, G6PDD in 20, DS in 15, and Porphyrias in 6. Conclusions: Incorporating genetic information on both common and rare clinically actionable variants into therapeutic decision-making has the potential to improve medication safety, reduce preventable ADRs, and enhance the effectiveness of personalized pharmacotherapy.
Karelin, A.; Brecht, I. B.; Pogoda, M.; Demidov, G.; Abele, M.; Schneider, D. T.; Aldea, D.; Etchevers, H. C.; Puig, S.; Hahn, M.; Forchhammer, S.
Background: Distinguishing benign proliferative nodules (PNs) from melanoma arising within congenital melanocytic nevi remains a major diagnostic challenge. Copy number alteration (CNA) analysis is widely used to support classification, but current criteria were developed using array comparative genomic hybridization (aCGH). The performance of alternative platforms such as shallow whole-genome sequencing (sWGS) and methylation arrays in this setting is poorly defined. Objectives: The objective of this study is to compare CNA profiles obtained from aCGH, sWGS, and methylation arrays in atypical nodules arising within congenital nevi, and to correlate these molecular findings with clinical outcomes. Methods: Sixteen samples from fourteen patients were retrospectively analyzed using all three platforms. CNAs were cataloged, concordance across methods was quantified using the Jaccard index, and molecular classifications were compared. Clinical follow-up was reviewed to provide clinical context. Results: aCGH detected 39 CNAs, sWGS 60, and methylation profiling 66. Concordance was highest between sWGS and methylation (mean Jaccard 0.67), followed by aCGH versus sWGS (0.64) and aCGH versus methylation (0.49). Cases with high aneuploidy demonstrated strong cross-platform agreement, whereas low-burden lesions exhibited greater variability between methods. Divergent molecular classifications were observed in six cases. Conclusions: While all methods reliably detect broad chromosomal changes, sWGS and methylation arrays identify many additional focal CNAs that may not align with CGH-based diagnostic criteria. Until platform-specific thresholds are established, aCGH remains the most conservative and clinically validated approach for evaluating proliferative nodules in congenital nevi. Significance: Accurate molecular classification of melanocytic proliferations in congenital nevi is essential but challenging, particularly in patients with multiple proliferative nodules. 
This study provides the first systematic comparison of aCGH, sWGS, and methylation-based CNA profiling in this setting. We show that higher-resolution platforms detect substantially more focal aberrations, which can lead to discordant and potentially overcalled malignancy assessments when applying CGH-derived criteria. Our findings highlight the need for platform-adapted diagnostic frameworks and support continued use of CGH as the most conservative and clinically validated method for risk stratification. [Graphical abstract: Figure 1.]
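The Jaccard index used above to quantify cross-platform concordance is the size of the intersection of two call sets divided by the size of their union. The abstract does not say how CNAs from different platforms were matched, so the sketch below assumes each CNA has already been reduced to a hashable key (here a hypothetical chromosome-arm and direction pair); the example calls are invented.

```python
def jaccard_index(calls_a, calls_b):
    """Jaccard index |A & B| / |A | B| between two collections of CNA keys.

    Two empty call sets are treated as fully concordant (1.0); that is a
    convention assumed here, not something stated in the abstract.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical calls for one sample on two platforms.
acgh_calls = {("1q", "gain"), ("9p", "loss")}
swgs_calls = {("1q", "gain"), ("9p", "loss"), ("7q", "gain"), ("11q", "loss")}

print(jaccard_index(acgh_calls, swgs_calls))  # 2 shared / 4 total = 0.5
```

Under this definition, a platform that reports many extra focal CNAs lowers the index even when every call from the other platform is reproduced, which is consistent with the greater variability reported for low-burden lesions.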
Boquett, J. A.; Lin, S. Y.-T.; House, J. S.; Ahn, K.; Suseno, R.; BakenRa, A.; Guthrie, K.; Wright, M.; Motsinger-Reif, A.; Maiers, M.; Hollenbach, J. A.
Background: Variation in the HLA loci, located on human chromosome 6p, has been associated with hundreds of diseases and conditions. However, the high levels of polymorphism that characterize the HLA system, coupled with generally modest effect sizes for most phenotypes, necessitate relatively large sample sizes to power association studies; meanwhile, high-resolution HLA genotyping remains relatively resource intensive. These constraints limit identification of novel associations. While phenome-wide association studies (PheWAS) in the context of large registries with available electronic health records (EHR) have revealed new insights into the role of HLA in disease, many common health conditions are poorly represented in EHR due to the temporal nature of their occurrence or general underreporting. Further, these studies have generally been conducted with HLA genotyping data imputed from microarrays, rather than direct measurement of high-resolution genotypes. Objective: To overcome these limitations and reveal novel HLA associations, we undertook a PheWAS in many previously understudied health conditions. Methods: We queried over 300 conditions, diseases and traits from 70,724 subjects registered with NMDP with available high-resolution HLA genotyping (HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1). After stratifying according to ancestry, we performed a logistic regression analysis adjusting for sex and age for HLA-phenotype association. Results: We identified 48 significant HLA associations across ancestry groups, confirming several known associations and uncovering fifteen novel associations. Most novel associations pertained to common infectious or allergic phenotypes that often go under-reported in the EHR. Of particular translational importance, we identified a previously undetected yet very strong association between HLA-DRB1*04:01 and sensitivity to cefaclor, a specific cephalosporin (OR = 3.74, p-value 5.10E-28). 
Molecular docking simulations predict cefaclor binding in the P4 pocket of HLA-DRB1*04:01, with substantially greater affinity than non-associated antibiotics, including other cephalosporins. This pharmacogenomic signal highlights an opportunity for risk stratification and targeted prevention of adverse drug reactions. Other novel associations found, such as susceptibility to genital warts (HPV) and allergic rhinitis, reveal new insights into the role of specific HLA alleles in immune-mediated disease. The vast majority of these novel associations were replicated in the independent All of Us cohort, confirming the validity of this approach. Conclusion: Collectively, our findings demonstrate the value of integrating population-scale, high-resolution HLA genotypes with phenotyping beyond the EHR to reveal immunogenetic influences on common health outcomes. They also point to immediate translational avenues - particularly for drug hypersensitivity - while motivating future functional studies and prospective clinical validation to refine mechanistic understanding and clinical utility.
Eisenhart, C. E.; Brickey, R.; Mewton, J.
The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of benchmark single nucleotide polymorphisms (SNPs) excludes rare or novel pathogenic variants that can invalidate a star allele call and lead to incorrect dosing recommendations. Furthermore, nearby non-pathogenic variants can interfere with haplotype interpretation, introducing additional risk of misclassification. To address these gaps, we developed PHARMWATCH, a multistep pharmacogenomics workflow for comprehensive variant analysis, allele tracking, and contextual interpretation. PHARMWATCH incorporates two algorithmic safeguards designed to identify genomic alterations that compromise star allele accuracy: (1) de novo germline variant screening using the ACMG-based BIAS-2015 classifier and (2) variant interpretation in context (VIIC) to validate the functional integrity of star allele-defining SNPs [3]. Together, these layers enhance the reliability of pharmacogenomic reporting, enabling safe, automated, and review-ready recommendations that extend beyond the constraints of traditional star allele-based approaches.
Karim, M. A.; Hukku, A.; Ariano, B.; Holzinger, E.; Tsepilov, Y.; Hayhurst, J.; Buniello, A.; McDonagh, E. M.; Castel, S. E.; Nelson, M. R.; Maranville, J.; Yerges-Armstrong, L.; Ghoussaini, M.
We assessed the impact of plasma protein quantitative trait loci (pQTL) on therapeutic hypotheses backed by human genetic evidence. We show that pQTL-supported target-indication pairs were 4.7 times more likely to advance from Phase I to launch, compared to a 2.6-fold increase observed only with human genetic evidence. Moreover, pQTL-based enrichment was prominent in druggable protein families which had limited enrichment from human genetic evidence alone.
Kara, M.; Gungor, A. F.; Kuday, S. E.; Ozcelik, O.; Ozden, F.
Genetic diagnosis remains a formidable challenge, characterized by a diagnostic odyssey that spans years; rare diseases affect more than 300 million people worldwide, and over half of rare disease patients remain undiagnosed. Clinicians must navigate through thousands of candidate variants against a noisy and fragmented literature landscape, a task that overwhelms human cognitive capacity and conventional decision-making approaches. Recent advances in agentic artificial intelligence systems have demonstrated superior performance in complex, multi-step reasoning tasks by systematically evaluating vast amounts of information, breaking down problems into manageable components, and adapting dynamically to new evidence. These capabilities align precisely with the requirements of genetic variant prioritization. Here we present DAVP (Deep Agentic Variant Prioritisation), a hierarchical agentic AI system that represents a major step forward in genetic diagnosis through patient-specific variant evaluation. Unlike traditional approaches that apply generic pathogenicity scores, DAVP evaluates each variant within the full context of the patient's clinical presentation, phenotypic profile, and genomic background. The system comprises three interconnected algorithmic components: prelimin8, a gene pre-screening algorithm that rapidly filters the genomic search space; inGeneTopMatch, a semantic knowledge graph algorithm that captures complex gene-phenotype-disease relationships; and elimin8, an in-context learning prioritization algorithm that dynamically ranks variants through iterative knowledge sorting and evidence synthesis. We conducted comprehensive benchmarks measuring diagnostic cumulative distribution function (CDF) recall based on top-k variant recommendations using simulation cases constructed with 1000 Genomes as healthy background genomes and variants from ClinVar as positive controls. 
DAVP demonstrates strong diagnostic performance superior to expert genetic clinicians while operating at orders of magnitude greater speed and scale. Our results demonstrate that agentic AI systems can transform rare disease diagnostics by combining the systematic evaluation capabilities of artificial intelligence with the nuanced clinical reasoning required for complex genetic diagnosis. This work lays the foundation for a new paradigm in AI-driven genetic medicine that could accelerate diagnosis, reduce healthcare costs, and improve patient outcomes worldwide. The source code and data to reproduce this work are available at https://github.com/Muti-Kara/davp.
Sebastian, C.; Yu, M.; Jin, J.
Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRS, leading to the development of numerous methods for PRS construction. Benchmarking these methods has therefore become essential for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenotypes and cohorts, the resulting evidence remains fragmented, and no work has comprehensively assessed the relative performance of the various PRS methods. In this study, we addressed this gap by systematically constructing a PRS method benchmarking database synthesizing published results from 2009 to 2025. We applied a spectral ranking inference framework with uncertainty quantification to rank 14 PRS methods that had been adequately compared against each other in the literature. We constructed rankings using two complementary sources: original method-development studies and applications/benchmarking studies. While the highest-ranked methods (LDpred2 and AnnoPred) and the lowest-ranked method (C+T) were consistently identified from both sources, the relative ordering of most methods showed moderate variability. We further constructed phenotype-specific rankings, providing more detailed insights into the robustness and phenotype-specific strengths of individual methods. Collectively, the overall and phenotype-specific rankings of the PRS methods, along with the curated benchmarking data from the literature, provide a dynamic and practical reference database that can be continually updated with newly emerging PRS methods and published benchmarking results to guide future PRS applications.
Dobbins, S. E.; Forner-Cordero, I.; Amigo Moreno, R.; Southgate, L.; Hobbs, K.; Moy, R.; Adjei, M.; Muntane, G.; Vilella, E.; Martorell, L.; Gordon, K.; Ostergaard, P. E.; Pittman, A.
Lipoedema is a chronic adipose tissue disorder mainly affecting women with excess subcutaneous fat deposition on the lower limbs, associated with pain and tenderness. There is often a family history of lipoedema, suggesting a genetic origin, but the contribution of genetics is not well studied. We conducted a genome-wide association study (GWAS) for this disorder in a clinically ascertained cohort from Spain and performed a meta-analysis with the UK lipoedema cohort GWAS. We then used the results of this study as a replication of the inferred UK Biobank (UKBB) "lipoedema phenotype" study. Whilst our meta-analysis alone did not identify any genome-wide significant associations, our clinical cohorts provide support for three loci identified through the UKBB study: the chr2q24.3 GRB14-COBLL1 locus (rs6753142, P_META=1.64×10^-6), the chr6p21.1 VEGFA locus (rs4711750, P_META=8.99×10^-7) and the chr5q11.2 ANKRD55-MAP3K1 locus (rs3936510, P_META=1.67×10^-5). We identify numerous rare SNPs with strong association signals in our meta-analysis (P<1×10^-6) with support in both UK and Spanish datasets, three of which also show nominal support in the UKBB (P<0.05). These findings provide a starting point towards understanding the genetic basis of clinical lipoedema and demonstrate the utility of combining large-scale biobank genetic data with clinically ascertained cohorts to elucidate the genetic architecture of lipoedema.
Buianova, A. A.; Cheranev, V. V.; Shmitko, A. O.; Vasiliadis, I. A.; Ilyina, G. A.; Suchalko, O. N.; Kuznetsov, M. I.; Belova, V. A.; Korostin, D. O.
Background: Personalized pharmacotherapy requires systematic consideration of genetic factors influencing drug efficacy and safety. The accumulation of large-scale whole-exome sequencing (WES) data provides an opportunity to assess population frequencies of clinically significant pharmacogenetic variants; however, the diagnostic applicability of exome data for pharmacogenomics remains insufficiently studied. Materials and Methods: A retrospective analysis of 6,102 anonymized sequencing datasets obtained between 2020 and 2025 was performed using the DNBSEQ-G400 (MGI) platform and Agilent SureSelect Human All Exon v6/v7/v8 enrichment kits. SNV and indel detection, CNV analysis, high-resolution HLA typing, and diplotype assignment for key pharmacogenes were conducted. Pharmacogenomic annotations were derived from PharmGKB (levels of evidence 1A-2B), CPIC, and PharmVar. Additionally, WES limitations and the feasibility of imputing non-coding pharmacogenetic variants were evaluated. Results: Population frequencies of alleles and metabolic phenotypes were determined for 13 Very Important Pharmacogenes (VIPs), along with the distribution of HLA class I and II alleles. The highest allelic and phenotypic variability was observed in CYP family genes, particularly CYP2D6, CYP2C19, and CYP2B6. A total of 663 pharmacogenomic annotations were identified, predominantly related to drug metabolism (50.38%) and toxicity (29.56%), including psychotropic agents, anticoagulants, statins, opioid analgesics, antineoplastic agents, and immunosuppressants. At least 32 drugs require pharmacogenetic testing based on variants located in non-coding regions, as well as accurate CYP2D6 copy number determination. Linkage disequilibrium analysis demonstrated the inability to reliably impute most non-coding pharmacogenetic variants from WES data. 
Conclusion: These findings represent one of the largest reference assessments to date of pharmacogenetically significant variant and HLA allele frequencies in the Russian population. The results confirm the utility of WES for population pharmacogenomic screening while simultaneously highlighting its fundamental limitations and the need for alternative genetic diagnostic methods in selected cases.
Mandla, R.; Li, X.; Shi, Z.; Abramowitz, S.; Lapinska, S.; Penn Medicine Biobank, ; Levin, M. G.; Damrauer, S. M.; Pasaniuc, B.
Polygenic scores (PGS) have emerged as an important tool for genetic risk prediction in medicine to identify individuals at high-risk for disease. A major limitation in their implementation is the apparent disagreement among scores for the same individual decreasing their interpretability and utility in clinical settings. Here we show that the poor agreement across PGSes for type 2 diabetes (T2D) is fully explained by statistical uncertainty in PGS-based prediction; individual-level uncertainty estimates from a single PGS explain the variability across existing PGSes. We provide an approach for the selection of high-risk individuals that incorporates measures of uncertainty and show that individuals with high confidence based on their PGS uncertainty have higher risk agreement across existing PGS and are more likely to develop T2D than high-risk individuals based on only point estimates of PGS. Together, these findings shed light on the factors underlying a roadblock in PGS implementation and underscore the need to incorporate uncertainty in PGS-based predictions.
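The abstract does not describe how uncertainty enters the selection of high-risk individuals, so the following Python sketch shows only one simple possibility and should not be read as the authors' method. It assumes each individual has a PGS point estimate and a standard deviation for that estimate, and it flags a person as confidently high-risk only when the lower bound of a normal-approximation interval exceeds the risk threshold. The function names, the threshold, and the example values are hypothetical.

```python
def high_risk_by_point_estimate(pgs_means, threshold):
    """Flag individuals whose PGS point estimate exceeds the threshold."""
    return [m > threshold for m in pgs_means]


def confident_high_risk(pgs_means, pgs_sds, threshold, z=1.96):
    """Flag individuals whose interval lower bound (mean - z * sd) exceeds
    the threshold. This rule is an illustrative assumption only."""
    return [(m - z * s) > threshold for m, s in zip(pgs_means, pgs_sds)]


# Hypothetical standardized scores for three individuals.
means = [2.4, 2.1, 1.2]
sds = [0.1, 0.5, 0.1]
cutoff = 2.0  # hypothetical high-risk threshold

print(high_risk_by_point_estimate(means, cutoff))  # [True, True, False]
print(confident_high_risk(means, sds, cutoff))     # [True, False, False]
```

The second individual is high-risk by point estimate but not confidently so, which is the kind of case where different scores for the same person would be expected to disagree.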
Neurgaonkar, P.; Dierolf, M.; O'Gorman, L.; Remmele, C.; Schaeffer, J.; Popp, I.; Borst, A.; Rost, S.; Ankenbrand, M.; Kratz, C.; Bergmann, A.; Kalb, R.; Yu, J.
Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables a comprehensive detection of FA variants, but a unified bioinformatic analysis platform for these data is missing. Results: We present FA-NIVA (Fanconi anemia - Nanopore Indel and Variant Analysis), an automated and adaptable analysis workflow tailored for Nanopore-based long-read sequencing data in FA genetic analysis. FA-NIVA integrates state-of-the-art tools to comprehensively detect both single nucleotide variants (SNVs) and structural variants (SVs). Our analysis platform enhances genotyping accuracy for biallelic variants by a joint SNV-SV based phasing in FA associated genes. Built within the Nextflow ecosystem and powered by containerized Docker images, FA-NIVA ensures reproducibility, flexibility, scalability and transparency across different computing environments. Together, FA-NIVA provides a robust end-to-end solution for the automated analysis of SVs and SNVs and high-resolution phasing analysis in FA genes, enabling an accurate and efficient pipeline for genetic analysis. Availability: FA-NIVA is available on GitHub at: https://github.com/UKWgenommedizin/FA-NIVA.
Liu, S.; Szabo, A.; Zarouchlioti, C.; Bhattacharyya, N.; Nguyen, Q.; Abreu Costa, M.; Luben, R.; Dudakova, L.; Skalicka, P.; Horak, M.; Khawaja, A.; Pontikos, N.; Muthusamy, K.; Tuft, S.; Liskova, P.; Davidson, A.
Purpose: Fuchs endothelial corneal dystrophy (FECD) is a common corneal disease and a leading indication for endothelial keratoplasty (EK). Although CTG18.1 repeat expansion is a major genetic risk factor, the contribution of polygenic background to disease progression remains unclear. We evaluated whether combining CTG18.1 expansion status with a FECD-specific polygenic risk score (PRS) enables genomic prediction of progression to EK. Methods: We retrospectively analysed 589 individuals with FECD from two European centers, with replication in an independent cohort of 185 individuals. Associations of CTG18.1 expansion (≥50 repeats) and PRS with time to EK were evaluated using Cox models adjusted for sex and ancestry. Results: Expansion-positive status was associated with earlier EK (HR 2.30; 95% CI 1.62-3.26; P<.001). Addition of PRS improved prediction (C-index 0.614 vs 0.602; P=.014). Each 1-SD increase in PRS was associated with earlier EK (HR 1.16; 95% CI 1.03-1.30; P=.015), with replication in the validation cohort (HR 1.42; 95% CI 1.15-1.75; P=.001). Conclusion: Integration of monogenic and polygenic risk enables genomic prediction of FECD progression, supporting clinical genomic risk stratification to inform individualized monitoring and timing of intervention.
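The C-index reported above measures how often a model ranks the person who reaches the event sooner as the higher-risk one. The abstract does not state which estimator was used; the sketch below implements Harrell's concordance index in plain Python as one common choice, with invented follow-up data. A pair counts as comparable only when the individual with the shorter follow-up time actually had the event, and tied risk scores count as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: the fraction of comparable pairs in which the individual
    with the earlier observed event also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: years to EK, event indicator (1 = EK performed), risk score.
years_to_ek = [2, 4, 6, 8]
had_ek = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]

print(concordance_index(years_to_ek, had_ek, risk))  # 4 of 5 pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which helps put the reported 0.602 and 0.614 in context.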
Li, Y.; Cornejo-Sanchez, D. M.; Dong, R.; Naderi, E.; Wang, G. T.; Leal, S. M.; DeWan, A. T.
The genetic relationship between asthma and lung function may be dependent on age-at-onset (AAO) of asthma. We investigated whether the shared genetics between asthma AAO and lung function is dependent on AAO. Asthma cases from UK Biobank were subset according to their AAO and genetic correlation was used to obtain genetically homogeneous groups, i.e., ≤20 (LT20), 20-40, and >40 (GT40) years. Association analysis and fine-mapping were performed to identify shared genetics between AAO groups and lung function. Mediation and quantitative trait locus (QTL) analyses were performed to identify mechanisms underlying shared genetic associations. Chr5, chr6, chr12, and chr17 each had one region that displayed a cross-phenotype replicated association with at least one AAO group and lung function. Overlapping credible sets obtained from fine-mapping were observed on chr5 and chr6. Mediation analyses demonstrated that, for each region, the proportion of the effect on lung function mediated through asthma was larger for asthma LT20 than for 20-40 and GT40, suggesting that their effects on lung function were more strongly driven by this association. Tissue-specific QTL analysis revealed that shared etiology on chr5 may be acting through SLC22A5 and C5orf56, which might play an important role in decreased lung function among individuals with earlier-onset asthma.
Matton, C.; Van De Velde, J.; De Bruyne, M.; Van De Sompele, S.; Hooghe, S.; Syryn, H.; Bauwens, M.; D'haene, E.; Dheedene, A.; Cools, M.; Komatsuzaki, S.; Preizner-Rzucidlo, E.; Ross, A.; Armstrong, C.; Watkins, W.; Shelling, A.; Vincent, A. L.; Cassiman, C.; Vermeer, S.; Bunyan, D. J.; Verdin, H.; De Baere, E.
Heterozygous FOXL2 (non-)coding sequence and structural variants (SVs) lead to blepharophimosis, ptosis and epicanthus inversus syndrome (BPES), a rare, autosomal dominant developmental disorder characterized by a completely penetrant eyelid malformation and incompletely penetrant primary ovarian insufficiency (POI). We collected variants from our in-house database, generated via clinical genetic testing and downstream research testing in the Center for Medical Genetics Ghent, Belgium (2001-2024), and via literature and other resources in the same period. All retrieved variants were categorized using ACMG/AMP classifications to increase the knowledge of pathogenicity. We collected 413 unique genetic defects of the FOXL2 region, including 76 novel variants, in 864 index patients. Of these, 87% of patients were identified with a coding FOXL2 sequence variant. The polyalanine tract is a known mutational hotspot of FOXL2, illustrated here by the high percentage of pathogenic polyalanine expansions (24%). Furthermore, the molecular spectrum in typical BPES index patients is characterized by 8% coding deletions and 3% deletions located up- and downstream of FOXL2. The remaining 2% carry translocations along with chromosomal rearrangements of 3q23. This uniform and structured reclassification, incorporating the largest dataset of variants implicated in FOXL2-associated disease so far, will improve both the diagnosis as well as genetic counselling for individuals with BPES.
Guler, F.; Goksuluk, D.; Xu, M.; Choudhary, G.; agraz, m.
Applying deep learning models to RNA-Seq data poses substantial challenges, primarily due to the high dimensionality of the data and the limited sample sizes. To address these issues, this study introduces an advanced deep learning pipeline that integrates feature engineering with data augmentation. The engineering application focuses on biomedical engineering, specifically the classification of RNA-Seq datasets for disease diagnosis. The proposed framework was initially validated on synthetic datasets generated from Naive Bayes, where MLP-based augmentation yielded a notable improvement in predictive performance. Building on this foundation, we applied the approach to chromophobe renal cell carcinoma (KICH) RNA-Seq data from The Cancer Genome Atlas (TCGA). Following standard preprocessing steps (normalization, transformation, and dimensionality reduction), the analysis concentrated on three main aspects: augmentation strategies, preprocessing methods, and explainable AI (XAI) techniques in relation to classification outcomes. Feature selection was performed through PCA, Boruta, and RF-based methods. Three augmentation strategies (linear interpolation, SMOTE, and MixUp) were evaluated. To maintain methodological rigor, augmentation was applied exclusively to the training set, while the test set was held out for unbiased evaluation. Within this framework, we conducted a comparative assessment of multiple deep learning architectures, including MLP, GNN, and the recently proposed Kolmogorov-Arnold networks (KAN). The GNN achieved the highest classification accuracy (99.47%) when trained with MixUp augmentation combined with RF feature selection, and achieved the best F1 score (0.9948). Consequently, the GNN-based XAI framework was applied to the RF dataset enriched with MixUp. XAI analyses identified the top 20 most influential genes, such as HNF4A, DACH2, MAPK15, and NAT2, which played the greatest role in classification, thereby confirming the biological plausibility of the model outputs. To further validate model robustness, cervical cancer and Alzheimer's RNA-Seq datasets were also tested, yielding consistent and reliable results. Overall, the findings highlight the value of incorporating data augmentation into deep learning models for RNA-Seq analysis, not only to improve predictive performance but also to enhance biological interpretability through explainable AI approaches.
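The RNA-Seq abstract stresses that augmentation was applied only to the training set. A minimal sketch of that idea with MixUp, in plain Python, might look as follows; it is not the authors' pipeline, and the function name `mixup`, the `alpha` value, the split sizes, and the toy data are all assumptions made for illustration.

```python
import random


def mixup(train_x, train_y, n_new, alpha=0.2, seed=0):
    """Generate n_new synthetic samples as convex combinations of training pairs.

    train_x: list of feature vectors; train_y: list of one-hot label vectors.
    The mixing weight lam is drawn from Beta(alpha, alpha), as in standard MixUp.
    """
    rng = random.Random(seed)
    new_x, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(train_x))
        j = rng.randrange(len(train_x))
        lam = rng.betavariate(alpha, alpha)
        new_x.append([lam * a + (1 - lam) * b for a, b in zip(train_x[i], train_x[j])])
        new_y.append([lam * a + (1 - lam) * b for a, b in zip(train_y[i], train_y[j])])
    return new_x, new_y


# Invented toy data: 40 samples, 5 features, 2 classes.
rng = random.Random(1)
samples = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
labels = [[1.0, 0.0] if k % 2 == 0 else [0.0, 1.0] for k in range(40)]

# Split FIRST, so that no test sample can leak into a synthetic training sample.
order = list(range(40))
rng.shuffle(order)
train_idx, test_idx = order[:30], order[30:]
train_x = [samples[k] for k in train_idx]
train_y = [labels[k] for k in train_idx]
test_x = [samples[k] for k in test_idx]
test_y = [labels[k] for k in test_idx]

# Augment the training portion only; the test set stays untouched.
aug_x, aug_y = mixup(train_x, train_y, n_new=60)
train_x_full = train_x + aug_x
train_y_full = train_y + aug_y
```

Splitting before augmenting is the design choice that matters here: mixing before the split would let information from held-out samples enter the training data and inflate the reported accuracy.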
Orkild, M. R.; Dybdahl, K. L.; Duun Rohde, P. D.
Inflammatory bowel disease (IBD) frequently co-occurs with immune-mediated and metabolic disorders, but whether these associations reflect shared genetics or causal effects remains unclear. We performed two-sample Mendelian randomization (MR) using large-scale genome-wide association study (GWAS) summary statistics to investigate potential causal effects of immune-mediated diseases and lifestyle traits on IBD, Crohn's disease (CD), and ulcerative colitis (UC). SNP-based heritability and genetic correlations were estimated to contextualize findings. Following false discovery rate correction, genetically predicted psoriasis was positively associated with IBD (OR 1.15), CD (OR 1.23), and UC (OR 1.10), with the strongest effect observed for CD. Genetically predicted type 2 diabetes mellitus (T2DM) showed a modest inverse association with UC (OR 0.88). No lifestyle-related traits remained significant after correction. Sensitivity analyses indicated heterogeneity across instruments and evidence of directional pleiotropy in selected models, whereas no pleiotropy was detected for the T2DM-UC association. These findings support a role of psoriasis-related immune pathways in IBD susceptibility and suggest a potential inverse association between genetic liability to T2DM and UC.
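For readers unfamiliar with two-sample MR, the sketch below shows the fixed-effect inverse-variance weighted (IVW) estimator, a common choice for combining per-SNP summary statistics. The abstract does not state which estimator the authors used, so IVW is an assumption here, as are the function name `ivw_estimate` and the invented summary statistics.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: per-SNP effect on the exposure (e.g. psoriasis liability)
    beta_outcome:  per-SNP effect on the outcome (log odds scale)
    se_outcome:    per-SNP standard error of the outcome effect
    Returns (causal log-odds estimate, its standard error).
    """
    numer = sum(bx * by / se ** 2
                for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denom = sum(bx ** 2 / se ** 2
                for bx, se in zip(beta_exposure, se_outcome))
    beta = numer / denom
    se = math.sqrt(1.0 / denom)
    return beta, se


# Invented summary statistics for three instruments (illustration only).
bx = [0.10, 0.20, 0.30]
by = [0.02, 0.03, 0.05]
se_y = [0.010, 0.020, 0.015]

beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
odds_ratio = math.exp(beta_ivw)                 # effect per unit of exposure liability
ci_low = math.exp(beta_ivw - 1.96 * se_ivw)     # approximate 95% interval
ci_high = math.exp(beta_ivw + 1.96 * se_ivw)
```

Exponentiating the log-odds estimate gives odds ratios on the same scale as those quoted in the abstract. The heterogeneity and pleiotropy checks it mentions would require additional estimators that are not shown here.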
Lapinska, S.; Li, X.; Mandla, R.; Shi, Z.; Tozzo, V.; Flynn-Carroll, A.; Ritchie, M. D.; Rader, D. J.; Penn Medicine Biobank; Pasaniuc, B.
The Genome Informed Risk Assessment (GIRA) report from eMERGE has become a standard approach to implement genomic precision medicine at scale. Here, we assess GIRA's utility and impact in a health care system independent of eMERGE, focusing on 9 adult conditions using the Penn Medicine Biobank (PMBB, n=48,279). We find a large number of patients - 50.1% (n=24,185) - were deemed by GIRA as high-risk for at least one of the 9 conditions with 30.4% (n=14,676) due to polygenic and/or monogenic risk. Stratifying by ancestry revealed significant differences in high-risk proportions, with higher rates in African/African American (AFR) (56.6% vs. 50.1%, p=7.43×10^-36) and lower rates in East (42.0%) and South Asian (40.0%). Increased high-risk rates were observed in the lowest quartile of social deprivation index, highlighting the influence of environmental factors and access to care on GIRA's utility. GIRA was a good predictor of prevalent cases (in line with the results reported for the eMERGE GIRA); incident case prediction was substantially attenuated for 5 of the 9 conditions (e.g., OR of 2.36 vs. HR of 1.31 for atrial fibrillation (AFIB)). We find demographic compositions of high-risk patients differed from the incident cases for some of the conditions; for example, individuals at high risk for AFIB were enriched for European ancestries, in contrast with incident AFIB cases, which were enriched for AFR ancestries. Overall, our results show the accuracy of GIRA as a biomarker to stratify high-risk patients for precision medicine and highlight implementation challenges in its impact on the health system if implemented at scale.
Fragoso-Bargas, N.; Escarcega-Castro, R. V.; Quintal-Ortiz, I.; Vera-Gamboa, L.; Valencia-Pacheco, G.; Valadez-Gonzalez, N.
Type 2 diabetes (T2D) affects 11.1% of the global population, underscoring the need for biomarkers that inform treatment response and glycemic outcomes. We evaluated the association between the FTO variant rs9939609-A and glycemic control in a Mexican population. A total of 174 individuals living with T2D from Merida and Sisal, Yucatan, were included, of whom 85% were receiving oral hypoglycemic agents as main treatment. Glycemic control was defined cross-sectionally from fasting glucose as good (≤130 mg/dL, n=63) or poor (>130 mg/dL, n=111). Linear mixed models incorporating relevant covariates and a family random intercept were used. Effect size estimates were transformed to logit odds ratios. After adjustment for age, sex, BMI, years with T2D, and treatment, we observed a significant association in the additive (OR = 1.15 [1.003-1.31]) and recessive (OR = 1.51 [1.03-2.23]) models. To conclude, rs9939609-A may be associated with poorer glycemic control despite pharmacologic therapy.